Add script to automatically make new meetings pages #2401
JFWooten4 wants to merge 21 commits into stellar:main
Conversation
Pull request overview
Adds a Python helper to generate new Stellar developer meeting pages from a YouTube URL/ID by downloading captions, producing a cleaned transcript, and scaffolding an MDX page (with optional summary/resources extraction).
Changes:
- Add `meetings/new-meeting.py` to fetch YouTube captions (via `yt_dlp`), clean/punctuate them, and generate an MDX meeting page.
- Add `meetings/README.md` documenting setup and usage of the script.
- Update site/build hygiene: exclude `README.md` from meetings blog ingestion and ignore local Python/cookies artifacts.
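As a rough sketch of the caption-download step described above (the function names here are hypothetical, not the PR's actual code, though the option keys are real `yt_dlp` options), `yt_dlp` can pull captions without downloading the media:

```python
# Sketch only: assumed shape of the caption-fetch step, not the PR's code.
# Option keys are real yt_dlp options; the function names are hypothetical.

def caption_opts(video_id: str, out_dir: str = ".") -> dict:
    """Build yt_dlp options that grab English captions, skipping the media."""
    return {
        "skip_download": True,       # no video/audio, captions only
        "writesubtitles": True,      # uploader-provided subtitles, if present
        "writeautomaticsub": True,   # fall back to auto-generated captions
        "subtitleslangs": ["en"],
        "subtitlesformat": "vtt",
        "outtmpl": f"{out_dir}/%(id)s.%(ext)s",
    }

def fetch_captions(video_id: str, out_dir: str = ".") -> None:
    from yt_dlp import YoutubeDL  # requires `pip install yt-dlp`
    with YoutubeDL(caption_opts(video_id, out_dir)) as ydl:
        ydl.download([f"https://www.youtube.com/watch?v={video_id}"])
```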
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| `meetings/new-meeting.py` | New generator script for meeting MDX + transcript/summary/resources extraction. |
| `meetings/README.md` | Usage and setup instructions for the generator script. |
| `docusaurus.config.ts` | Excludes `README.md` files from the meetings blog content glob. |
| `.gitignore` | Ignores `*.pyc` and `**/cookies.txt` to avoid committing local artifacts/secrets. |
I really like the new style of having the meeting author commit the page updates with extensive notes based on their immediate memory. Each section of news in Kaan's #2388 could easily be an independent page reference for significant network developments. The styling encoded in the script currently best adapts to the historic CAP discussions, like how:
- `a SEP's author` → `a SEP's author`
- `CAP's themselves` → `CAPs themselves`
- `CAP's are` → `CAPs are`
- `SEP's include` → `SEPs include`
- L99: not a cat
I'm certainly interested in the script to automate some of this work! Can you run the script and commit an "example" meeting-notes file so I can take a look at what the output is? The changes in
This is the raw output with no effort put into tags or authors. You can see the weakest point is the key-points extraction; I haven't found a good command-line AI summarization package yet. Overall, the script defers to good human writing, or at least to higher-context transcript analysis. I ran it without any cookies config since one meeting won't hit anywhere near standard rate limits. But when you're exporting dozens of them over and over for testing... well, that's how I got my IP blocked from unauthenticated YouTube for a couple of weeks. It's totally fine to run a few without the cookies export, and it's easy enough to add when you need it.
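A minimal sketch of the caption-cleaning idea behind this output (the PR's actual implementation likely differs): strip WebVTT headers, cue timings, and inline tags, then collapse the duplicate lines that YouTube's rolling auto-captions produce.

```python
import re

def clean_vtt(vtt_text: str) -> str:
    """Hypothetical cleaner: turn a YouTube .vtt caption file into prose-ish
    text by dropping headers, cue timings, tags, and rolling repeats."""
    lines, seen_last = [], None
    for line in vtt_text.splitlines():
        line = re.sub(r"<[^>]+>", "", line).strip()  # inline <c>/timestamp tags
        if (not line
                or line.startswith(("WEBVTT", "Kind:", "Language:"))
                or "-->" in line):                   # header or cue timing
            continue
        if line != seen_last:                        # collapse rolling repeats
            lines.append(line)
            seen_last = line
    return " ".join(lines)
```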
Here it is run on the last meeting from YouTube prior to the current docs start. I explained the cookies logic much more in depth here. TL;DR: YouTube rate-limits your data calls, and the best way to speed things up is to export your login token. The cookies file is gitignored because anyone with it can impersonate you in the browser, at least while it's unexpired. For large batch actions that require the file, this ensures nobody accidentally commits their data. I did use this in #1087 to do batch action configs. That was also where I wrote it, so that branch has a lot more human touch from when I set up the framework for this script. While it maintains some of that bulk flexibility, like the command-line interface, the main idea is to run this when there is a new Dev meeting upload. It goes per-file and on-demand, so it's a simple little executable once the videos are online. You can run it twice for the new double-chat format and then just combine the files.
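The cookies handling could be wired up as an optional pass-through, roughly like this sketch (`with_cookies` is a hypothetical helper; `cookiefile` is a real `yt_dlp` option, and the `cookies.txt` filename matches the new `.gitignore` entry):

```python
import os

def with_cookies(opts: dict, cookies_path: str = "cookies.txt") -> dict:
    """Attach an exported browser-cookies file to yt_dlp options when present.
    The file is gitignored because anyone holding it can impersonate your
    YouTube session until it expires."""
    if os.path.exists(cookies_path):
        return {**opts, "cookiefile": cookies_path}  # real yt_dlp option key
    return opts  # unauthenticated is fine for a handful of videos
```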
Warning
This relies on #2390 and #2361, which are blocking.
This simple Python script creates a new meetings page with just a URL / video ID from YouTube. All new meetings use YouTube, so I didn't add support for older methods in #2362 or #2363.
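Accepting either a full URL or a bare video ID could look something like this sketch (a guess at the parsing, not the PR's actual code):

```python
import re

def video_id_from(arg: str) -> str:
    """Hypothetical parser: accept a bare 11-character YouTube video ID or
    common URL forms (watch?v=..., youtu.be/..., /embed/..., /live/...)."""
    m = re.search(r"(?:v=|youtu\.be/|/embed/|/live/)([A-Za-z0-9_-]{11})", arg)
    if m:
        return m.group(1)
    if re.fullmatch(r"[A-Za-z0-9_-]{11}", arg):  # already a bare ID
        return arg
    raise ValueError(f"could not parse a YouTube video ID from {arg!r}")
```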
It has quite a few extra bells and whistles, which I think would be good to add into the repo on `main` before trimming down. One of them is the option to output the transcript into a text file; I've added a flag for that if others want to test presentation ideas on the source materials.

I also set up more structure with a Key Points and a Resources section by default. All the past meetings follow this flow, where the notes highlight main topics and then refer to resources. Usually, those resources are links to CAPs - more to come in integrating those into tags later.

In the past, pages would repeat links or ideas redundantly. This new flow encourages a clear list of where viewers can learn more.
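The default page shape with those sections could be scaffolded roughly like this (the frontmatter fields and section wording are my assumptions about the output, not the script's exact template):

```python
from datetime import date

def scaffold_mdx(title: str, video_id: str, when: date) -> str:
    """Hypothetical MDX scaffold: frontmatter plus the default
    Key Points / Resources / Transcript sections."""
    return f"""---
title: "{title}"
date: {when.isoformat()}
---

<iframe src="https://www.youtube.com/embed/{video_id}" />

## Key Points

- TODO: summarize the main topics

## Resources

- TODO: link the relevant CAPs/SEPs

## Transcript

TODO: insert the cleaned transcript
"""
```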
As for the transcript logic, I believe this functionality will be invaluable for SEO. It will be much easier to search for developer chats when they follow the cleaned encoding, which presents topics in writing. Think quoted searches or even intra-docs CAP queries, which can now easily reference source materials from Core Devs and the Community.